\documentclass[DaoFP]{subfiles}
\begin{document}
\setcounter{chapter}{10}

\chapter{Algebras}

The essence of algebra is the formal manipulation of expressions. But what are expressions, and how do we manipulate them?

The first things to observe about algebraic expressions like $2 (x + y)$ or $a x^2 + b x + c$ is that there are infinitely many of them. There is a finite number of rules for making them, but these rules can be used in infinitely many combinations. This suggests that the rules are used \emph{recursively}. 


In programming, expressions are virtually synonymous to (parsing) trees. Consider this simple example of an arithmetic expression:
\begin{haskell}
data Expr = Val Int 
          | Plus Expr Expr
\end{haskell}
It's a recipe for building trees. We start with little trees using the \hask{Val} constructor. We then plant these seedlings into nodes, and so on.
\begin{haskell}
e2 = Val 2
e3 = Val 3
e5 = Plus e2 e3
e7 = Plus e5 e2
\end{haskell}

Such recursive definitions work perfectly well in a programming language. The problem is that every new recursive data structure would require its own library of functions that operate on it.

From type-theory point of view, we've been able to define recursive types, such as natural numbers or lists, by providing, in each case, specific introduction and elimination rules. What we need is something more general, a procedure for generating arbitrary recursive types from simpler pluggable components. 

There are two orthogonal concerns when it comes to recursive data structures. One is the machinery of recursion. The other is the pluggable components. 

We know how to work with recursion: We assume that we know how to construct small trees. We then use the recursive step to plant those trees into nodes to make bigger trees. 

Category theory tells us how to formalize this imprecise description.

\section{Algebras from Endofunctors}

The idea of planting smaller trees into nodes requires that we formalize what it means to have a data structure with holes---a ``container for stuff.'' This is exactly what functors are for. Because we want to use these functors recursively, they have to be \emph{endo}-functors.

For instance, the endofunctor from our earlier example would be defined by the following data structure, where \hask{x}'s mark the spots:
\begin{haskell}
data ExprF x = ValF Int 
             | PlusF x x
\end{haskell}
Information about all possible shapes of expressions is abstracted into one single functor. 

The other important piece of information when defining an algebra is the recipe for evaluating expressions. This, too, can be encoded using the same endofunctor. 

Thinking recursively, let's assume that we know how to evaluate all subtrees of a larger expression. Then the remaining step is to plug these results into the top level node and evaluate it. 

For instance, suppose that the \hask{x}'s in the functor were replaced by integers---the results of evaluation of the subtrees. It's pretty obvious what we should do in the last step. If the top of our tree is just a leaf \hask{ValF} (which means there were no subtrees to evaluate) we'll just return the integer stored in it. If it's a \hask{PlusF} node, we'll add the two integers in it. This recipe can be encoded as:
\begin{haskell}
eval :: ExprF Int -> Int
eval (ValF n)    = n
eval (PlusF m n) = m + n
\end{haskell}

We have made some seemingly obvious assumptions based on common sense. For instance, since the node was called \hask{PlusF} we assumed that we should add the two numbers. But multiplication or subtraction would work equally well.

Since the leaf \hask{ValF} contained an integer, we assumed that the expression should evaluate to an integer. But there is an equally plausible evaluator that pretty-prints the expression by converting it to a string. This evaluator uses concatenation instead of addition:
\begin{haskell}
pretty :: ExprF String -> String
pretty (ValF n)    = show n
pretty (PlusF s t) = s ++ " + " ++ t
\end{haskell}

In fact there are infinitely many evaluators, some sensible, others less so, but we shouldn't be judgmental. Any choice of the target type and any choice of the evaluator should be equally valid. This leads to the following definition:

An \emph{algebra} for an endofunctor $F$ is a pair $(c, \alpha)$. The object $c$ is called the \emph{carrier} of the algebra, and the evaluator $\alpha \colon F c \to c$ is called the \emph{structure map}.

In Haskell, given the functor \hask{f} we define:
\begin{haskell}
type Algebra f c = f c -> c
\end{haskell}

Notice that the evaluator is \emph{not} a polymorphic function. It's a specific choice of a function for a specific type \hask{c}. There may be many choices of the carrier types and there be many different evaluators for a given type. They all define separate algebras.

We have previously defined two algebras for \hask{ExprF}. This one has \hask{Int} as a carrier:
\begin{haskell}
eval :: Algebra ExprF Int
eval (ValF n)   = n
eval (PlusF m n) = m + n
\end{haskell}
and this one has \hask{String} as a carrier:
\begin{haskell}
pretty :: Algebra ExprF String
pretty (ValF n)   = show n
pretty (PlusF s t) = s ++ " + " ++ t
\end{haskell}

\section{Category of Algebras}

Algebras for a given endofunctor $F$ form a category. An arrow in that category is an algebra morphism, which is a structure-preserving arrow between their carrier objects. 

Preserving structure in this case means that the arrow must commute with the two structure maps. This is where functoriality comes into play. To switch from one structure map to another, we have to be able to lift an arrow that goes between their carriers. 

Given an endofunctor $F$, an \emph{algebra morphism} between two algebras $(a, \alpha)$ and $(b, \beta)$ is an arrow $f \colon a \to b$ that makes this diagram commute:
\[
 \begin{tikzcd}
 F a 
 \arrow[r, "F f"]
 \arrow[d, "\alpha"]
 & F b
\arrow[d, "\beta"]
 \\
 a
 \arrow[r, "f"]
 & b
  \end{tikzcd}
\]
In other words, the following equation must hold:
\[f \circ \alpha = \beta \circ F f \]

The composition of two algebra morphisms is again an algebra morphism, which can be seen by pasting together two such diagrams (a functor maps composition to composition). The identity arrow is also an algebra morphism, because 
\[ id_a \circ \alpha = \alpha \circ F (id_a) \]
(a functor maps identity to identity).

The commuting condition in the definition of an algebra morphism is very restrictive. Consider for instance a function that maps an integer to a string. In Haskell there is a \hask{show} function (actually, a method of the \hask{Show} class) that does it. It is \emph{not} an algebra morphism from \hask{eval} to \hask{pretty}. 

\begin{exercise}
Show that \hask{show} is not an algebra morphism. Hint: Consider what happens to the \hask{PlusF} node.
\end{exercise}

\subsection{Initial algebra}

The initial object in the category of algebras for a given functor $F$ is called the \emph{initial algebra} and, as we'll see, it plays a very important role.

By definition, the initial algebra $(i, \iota)$ has a unique algebra morphism $f$ from it to any other algebra $(a, \alpha)$. Diagrammatically:

\[
 \begin{tikzcd}
 F i 
 \arrow[r, "F f"]
 \arrow[d, "\iota"]
 & F a
\arrow[d, "\alpha"]
 \\
 i
 \arrow[r, dashed, "f"]
 & a
  \end{tikzcd}
\]
 This unique morphism is called a \emph{catamorphism} for the algebra $(a, \alpha)$. It is sometimes written using \index{banana brackets}\index{$\llparenthesis \rrparenthesis$}``banana brackets'' as $\llparenthesis \alpha \rrparenthesis$.


\begin{exercise}
Let's define two algebras for the following functor:
\begin{haskell}
data FloatF x = Num Float | Op x x
    deriving Functor
\end{haskell}
The first algebra:
\begin{haskell}
addAlg :: Algebra FloatF Float
addAlg (Num x) = log x
addAlg (Op x y) = x + y
\end{haskell}
The second algebra:
\begin{haskell}
mulAlg :: Algebra FloatF Float
mulAlg (Num x) = x
mulAlg (Op x y) = x * y
\end{haskell}
Make a convincing argument that \hask{log} (logarithm) is an algebra morphism between these two. (\hask{Float} is a built-in floating-point number type.)
\end{exercise}

\section{Lambek's Lemma and Fixed Points}


Lambek's lemma says that the structure map $\iota$ of the initial algebra is an isomorphism. 

The reason for it is the self-similarity of algebras. You can lift any algebra $(a, \alpha)$ using $F$, and the result $(F a, F \alpha)$ is also an algebra with the structure map $F \alpha \colon F (F a) \to F a$. 

In particular, if you lift the initial algebra $(i, \iota)$, you get a new algebra with the carrier $F i$ and the structure map $F \iota \colon F (F i) \to F i$. It follows then that there must be a unique algebra morphism from the initial algebra to it:
\[
 \begin{tikzcd}
 F i 
 \arrow[r, "F h"]
 \arrow[d, "\iota"]
 & F (F i)
\arrow[d, "F \iota"]
 \\
 i
 \arrow[r, dashed, "h"]
 & F i
  \end{tikzcd}
\]
This $h$ is the inverse of $\iota$. To see that, let's consider the composition $\iota \circ h$. It is the arrow at the bottom of the following diagram
\[
 \begin{tikzcd}
 F i 
 \arrow[r, "F h"]
 \arrow[d, "\iota"]
 & F (F i)
\arrow[d, "F \iota"]
\arrow[r, "F \iota"]
& F i
\arrow[d, "\iota"]
 \\
 i
 \arrow[r, dashed, "h"]
 & F i
 \arrow[r, "\iota"]
 & i
  \end{tikzcd}
\]
This is a pasting of the original diagram with a trivially commuting diagram. Therefore the whole rectangle commutes. We can interpret this as $\iota \circ h$ being an algebra morphism from $(i, \iota)$ to itself. But there already is such an algebra morphism---the identity. So, by uniqueness of the mapping out from the initial algebra, these two must be equal:
\[ \iota \circ h = id_i \] 

Knowing that, we can now go back to the previous diagram, which states that:
\[ h \circ \iota = F \iota \circ F h \]
Since $F$ is a functor, it maps composition to composition and identity to identity. Therefore the right hand side is equal to:
\[ F (\iota \circ h) = F (id_i) = id_{F i} \]

We have thus shown that $h$ is the inverse of $\iota$, which means that $\iota$ is an isomorphism. In other words:
\[ F i \cong i \]
We interpret this identity as stating that $i$ is a fixed point of $F$ (up to isomorphism). The action of $F$ on $i$ ``doesn't change it.''

There may be many fixed points, but this one is the \emph{least fixed point} because there is an algebra morphism from it to any other fixed point. The least fixed point of an endofunctor $F$ is denoted $\mu F$, so we write:
\[ i = \mu F \]
\subsection{Fixed point in Haskell}
Let's consider how the definition of the fixed point works with our original example given by the endofunctor:
\begin{haskell}
data ExprF x = ValF Int | PlusF x x
\end{haskell}
Its fixed point is a data structure defined by the property that \hask{ExprF} acting on it reproduces it. If we call this fixed point \hask{Expr}, the fixed point equation becomes  (in pseudo-Haskell):
\begin{haskell}
Expr = ExprF Expr
\end{haskell}
Expanding \hask{ExprF} we get:
\begin{haskell}
Expr = ValF Int | PlusF Expr Expr
\end{haskell}
Compare this with the recursive definition (actual Haskell):
\begin{haskell}
data Expr = Val Int | Plus Expr Expr
\end{haskell}
We get a recursive data structure as a solution to the fixed-point equation.

In Haskell, we can define a fixed point data structure for any functor (or even just a type constructor). As we'll see later, this doesn't always give us the carrier of the initial algebra. It only works for those functors that have the ``leaf'' component.

Let's call \hask{Fix f} the fixed point of a functor \hask{f}. Symbolically, the fixed-point equation can be written as:
\[f ( \text{Fix} f) \cong  \text{Fix} f \]
or, in code,
\begin{haskell}
data Fix f where
  In :: f (Fix f) -> Fix f
\end{haskell}
The data constructor \hask{In} is exactly the structure map of the initial algebra whose carrier is \hask{Fix f}. Its inverse is:
\begin{haskell}
out :: Fix f -> f (Fix f)
out (In x) = x
\end{haskell}
The Haskell standard library contains a more idiomatic definition:
\begin{haskell}
newtype Fix f = Fix { unFix :: f (Fix f) }
\end{haskell}
This definition reflects the isomorphism: \hask{unFix} being the inverse of the data constructor \hask{Fix}.

To create terms of the type \hask{Fix f} we often use ``smart constructors.'' For instance, with the \hask{ExprF} functor, we would define:
\begin{haskell}
val :: Int -> Fix ExprF
val n = Fix (ValF n)

plus :: Fix ExprF -> Fix ExprF -> Fix ExprF
plus e1 e2 = Fix (PlusF e1 e2)
\end{haskell}
and use it to generate expression trees like this one:
\begin{haskell}
e9 :: Fix ExprF
e9 = plus (plus (val 2) (val 3)) (val 4)
\end{haskell}

\section{Catamorphisms}

Our goal, as programmers, is to be able to perform a computation over a recursive data structure---to ``fold'' it. We now have all the ingredients. 

The data structure is defined as a fixed point of a functor. An algebra for this functor defines the operation we want to perform. We've seen the fixed point and the algebra combined in the following diagram:
\[
 \begin{tikzcd}
 F i 
 \arrow[r, "F f"]
 \arrow[d, "\iota"]
 & F a
\arrow[d, "\alpha"]
 \\
 i
 \arrow[r, dashed, "f"]
 & a
  \end{tikzcd}
\]
that defines the catamorphism $f$ for the algebra $(a, \alpha)$.

The final piece of information is the Lambek's lemma, which tells us that $\iota$ could be inverted because it's an isomorphism. It means that we can read this diagram as:
\[ f = \alpha \circ F f \circ \iota^{-1} \]
and interpret it as a recursive definition of $f$. 

Let's redraw this diagram using Haskell notation. The catamorphism depends on the algebra so, for the algebra with the carrier \hask{a} and the evaluator \hask{alg}, we'll have the catamorphism \hask{cata alg}.

\[
 \begin{tikzcd}
  \hask{f (Fix f)}
 \arrow[rrrr, "\hask{fmap (cata alg)}"]
 &&&& \hask{f a}
\arrow[d, "\hask{alg}"]
 \\
 \hask{Fix f}
 \arrow[u, "\hask{unFix}"]
 \arrow[rrrr, dashed, "\hask{cata alg}"]
 &&&& \hask{a}
  \end{tikzcd}
\]
By simply following the arrows, we get this recursive definition:
\begin{haskell}
cata :: Functor f => Algebra f a -> Fix f -> a
cata alg = alg . fmap (cata alg) . unFix
\end{haskell}

Here's what's happening: We apply this definition to some \hask{Fix f}. Every \hask{Fix f} is obtained by applying the data constructor \hask{Fix} to a functorful of \hask{Fix f}:
\begin{haskell}
Fix :: f (Fix f) -> Fix f
\end{haskell}
The function \hask{unFix} ``strips'' the data constructor \hask{Fix}:
\begin{haskell}
unFix :: Fix f -> f (Fix f)
unFix (Fix x) = x
\end{haskell}

We can now evaluate the functorful of \hask{Fix f} by \hask{fmap}'ing \hask{cata alg} over it. This is a recursive application. The idea is that the trees inside the functor are smaller than the original tree, so the recursion eventually terminates. It terminates when it hits the leaves. 

After this step, we are left with a functorful of values, and we apply the evaluator \hask{alg} to it, to get the final result.

The power of this approach is that all the recursion is encapsulated in one data type and one library function: We have the definition of \hask{Fix} and the catamorphism \hask{cata}. The client of the library provides only the \emph{non-recursive} pieces: the functor and the algebra. These are much easier to deal with. We have decomposed a complex problem into simpler components.

\subsection{Examples}

We can immediately apply this construction to our earlier examples. You can check that:
\begin{haskell}
cata eval e9
\end{haskell}
evaluates to $9$ and
\begin{haskell}
cata pretty e9
\end{haskell}
evaluates to the string \hask{"2 + 3 + 4"}.

Sometimes we want to display the tree on multiple lines with indentation. This requires passing a depth counter to recursive calls. There is a clever trick that uses a function type as a carrier:
\begin{haskell}
pretty' :: Algebra ExprF (Int -> String)
pretty' (ValF n) i = indent i ++ show n
pretty' (PlusF f g) i = f (i + 1) ++ "\n" ++
                        indent i ++ "+" ++ "\n" ++
                        g (i + 1)
\end{haskell}
The auxiliary function \hask{indent} replicates the space character:
\begin{haskell}
indent n = replicate (n * 2) ' '
\end{haskell}
The result of:
\begin{haskell}
cata pretty' e9 0
\end{haskell}
when printed, looks like this:
\begin{haskell}
    2
  +
    3
+
  4
\end{haskell}


Let's try defining algebras for other familiar functors. The fixed point of the \hask{Maybe} functor:
\begin{haskell}
data Maybe x = Nothing | Just x
\end{haskell}
after some renaming, is equivalent to the type of natural numbers
\begin{haskell}
data Nat = Z | S Nat
\end{haskell}
An algebra for this functor consists of a choice of the carrier \hask{a} and an evaluator:
\begin{haskell}
alg :: Maybe a -> a
\end{haskell}
The mapping out of \hask{Maybe} is determined by two things: the value corresponding to \hask{Nothing} and a function \hask{a->a} corresponding to \hask{Just}. In our discussion of the type of natural numbers we called these \hask{init} and \hask{step}. We can now see that the elimination rule for \hask{Nat} is the catamorphism for this algebra.

\subsection{Lists as initial algebras}

The list type that we've seen previously is equivalent to a fixed point of the following functor, which is parameterized by the type of the list contents \hask{a}:
\begin{haskell}
data ListF a x = NilF | ConsF a x
\end{haskell}
An algebra for this functor is a mapping out:
\begin{haskell}
alg :: ListF a c -> c
alg NilF = init
alg (ConsF a c) = step (a, c)
\end{haskell}
which is determined by the value \hask{init} and the function \hask{step}:
\begin{haskell}
init :: c
step :: (a, c) -> c
\end{haskell}
A catamorphism for such an algebra is the list recursor:
\begin{haskell}
recList :: c -> ((a, c) -> c) -> (List a -> c)
\end{haskell}
where \hask{(List a)} can be identified with the fixed point \hask{Fix (ListF a)}.

We've seen before a recursive function that reverses a list. It was implemented by appending elements to the end of a list, which is very inefficient. It's easy to rewrite this function using a catamorphism, but the problem remains. 

Prepending elements, on the other hand, is cheap. A better algorithm would traverse the list, accumulating elements in a first-in-first-out queue, and then pop them one-by-one and prepend them to a new list. 

The queue regimen can be implemented using composition of closures: each closure is a function that remembers its environment. Here's the algebra whose carrier is a function type:
\begin{haskell}
revAlg :: Algebra (ListF a) ([a]->[a])
revAlg NilF = id
revAlg (ConsF a f) = \as -> f (a : as)
\end{haskell}
At each step, this algebra creates a new function. This function, when executed, will apply the previous function \hask{f} to a list, which is the result of prepending the current element \hask{a} to the function's argument \hask{as}. The resulting closure remembers the current element \hask{a} and the previous function \hask{f}.

The catamorphism for this algebra accumulates a queue of such closures. To reverse a list, we apply the result of the catamorphism for this algebra to the empty list:
\begin{haskell}
reverse :: Fix (ListF a) -> [a]
reverse as = (cata revAlg as) []
\end{haskell}
This trick is at the core of the fold-left function, \hask{foldl}. Care should be taken when using it, because of the danger of stack overflow. 

Lists are so common that their eliminators (called ``folds'') are included in the standard library. But there are infinitely many possible recursive data structures, each generated by its own functor, and we can use the same catamorphism on all of them.

It's worth mentioning that the list construction works in any monoidal category with coproducts. We can replace the list functor with the more general:
\[ F x = I + a \otimes x \]
where $I$ is the unit object and $\otimes$ is the tensor product. The solution to the fixed point equation:
\[ L_a \cong I + a \otimes L_a \]
can be formally written as a series:
\[ L_a = I + a + a \otimes a + a \otimes a \otimes a + ... \]
We interpret this as a definition of a list, which can be empty $I$, a singleton $a$, a two-element list $a \otimes a$ and so on.

Incidentally, if you squint hard enough, this solution can be obtained by following a sequence of formal transformations:
\begin{align*}
L_a &\cong I + a \otimes L_a
\\
L_a - a \otimes L_a &\cong I
\\
(I - a) \otimes L_a &\cong I
\\
L_a &\cong I / (I - a)
\\
L_a &\cong I + a + a \otimes a + a \otimes a \otimes a + ...
\end{align*}
where the last step uses the formula for the sum of the geometric series. Admittedly, the intermediate steps make no sense, since there is no subtraction or division defined on objects, yet the final result make sense and, as we'll see later, it may be made rigorous by considering a colimit of a chain of objects.

\section{Initial Algebra from Universality}

Another way of looking at the initial algebra, at least in $\mathbf{Set}$, is to view it as a collection of catamorphisms that, as a whole, hint at the existence of an underlying object. Instead of seeing $\mu F$ as a set of trees, we can look at it as a set of functions from algebras to their carriers. 

In a way, this is just another manifestation of the Yoneda lemma: every data structure can be described either by mappings in or mappings out. The mappings in, in this case, are the constructors of the recursive data structure. The mappings out are all the catamorphisms that can be applied to it.

First, let's make the polymorphism in the definition of \hask{cata} explicit and use pattern matching instead of \hask{unFix}:
\begin{haskell}
cata :: Functor f => forall a. Algebra f a -> Fix f -> a
cata alg (Fix x) = alg (fmap (cata alg) x)
\end{haskell}
and then flip the arguments. We get:
\begin{haskell}
cata' :: Functor f => Fix f -> forall a. Algebra f a -> a
cata' (Fix x) = \alg -> alg (fmap (flip cata' alg) x)
\end{haskell}
The function \hask{flip'} reverses the order of arguments to a function:
\begin{haskell}
    flip' :: (c -> forall a. Algebra f a -> a) -> 
             (forall a. Algebra f a -> c -> a)
    flip' g a c = g c a
\end{haskell}
This gives us a mapping from \hask{Fix f} to a set of polymorphic functions.

Conversely, given a polymorphic function of the type:
\begin{haskell}
forall a. Algebra f a -> a
\end{haskell}
we can reconstruct \hask{Fix f}:
\begin{haskell}
uncata :: Functor f => (forall a. Algebra f a -> a) -> Fix f
uncata alga = alga Fix
\end{haskell}
In fact, these two functions, \hask{cata'} and \hask{uncata}, are the inverse of each other, establishing the isomorphism between \hask{Fix f} and the type of polymorphic functions:
\begin{haskell}
newtype Mu f = Mu (forall a. Algebra f a -> a)
\end{haskell}
We can now substitute \hask{Mu f} everywhere we used  \hask{Fix f}.

Folding over \hask{Mu f} is easy, since \hask{Mu} carries in itself its own set of catamorphisms:
\begin{haskell}
cataMu :: Algebra f a -> (Mu f -> a)
cataMu alg (Mu h) = h alg
\end{haskell}

You might be wondering how one can construct terms of the type \hask{Mu f} for, let's say, a list. It can be done using recursion:
\begin{haskell}
fromList :: forall a. [a] -> Mu (ListF a)
fromList as = Mu h
  where h :: forall x. Algebra (ListF a) x -> x
        h alg = go as
          where
            go [] = alg NilF
            go (n: ns) = alg (ConsF n (go ns))
\end{haskell}

\begin{exercise}
Write a test that takes a list of integers, converts it to the \hask{Mu} form, and calculates the sum using \hask{cataMu}.
\end{exercise}


\section{Initial Algebra as a Colimit}

In general, there is no guarantee that the initial object in the category of algebras exists. But if it exists, Lambek's lemma tells us that it's a fixed point of the endofunctor for those algebras.  The construction of this fixed point is a little mysterious, since it involves tying the recursive knot. 

Loosely speaking, the fixed point is reached after we apply the functor infinitely many times. Then, applying it once more won't change anything. Infinity plus one is still infinity. This idea can be made precise if we take it one step at a time. For simplicity, let's consider algebras in the category of sets, which has all the nice properties.

We've seen, in our examples, that building instances of recursive data structures always starts with the leaves. The leaves are the parts in the definition of the functor that don't depend on the type parameter: the \hask{NilF} of the list, the \hask{ValF} of the tree, the \hask{Nothing} of the \hask{Maybe}, etc. 

We can tease them out if we apply our functor $F$ to the initial object---the empty set $0$. Since the empty set has no elements, the instances of the type $F 0$ are leaves only. 

Indeed, the only inhabitant of the type \hask{Maybe Void} is constructed using \hask{Nothing}. The only inhabitants of the type \hask{ExprF Void} are \hask{ValF n}, where \hask{n} is an \hask{Int}.

In other words, $F 0$ is the ``type of leaves'' for the functor $F$. Leaves are trees of depth one. For the \hask{Maybe} functor there's only one. The type of leaves for this functor is a singleton:
\begin{haskell}
m1 :: Maybe Void
m1 = Nothing
\end{haskell}

In the second iteration, we apply $F$ to the leaves from the previous step and get trees of depth at most two. Their type is $F(F 0)$. 

For instance, these are all the terms of the type \hask{Maybe(Maybe Void)}:
\begin{haskell}
m2, m2' :: Maybe (Maybe Void)
m2 = Nothing
m2' = Just Nothing
\end{haskell}

We can continue this process, adding deeper and deeper trees at each step. In the $n$-th iteration, the type $F^n 0$ ($n$-fold application of $F$ to the initial object) describes all trees of depth up to $n$. However, for every $n$, there are still infinitely many trees of depth greater than $n$ that are not covered. 

If we knew how to define $F^{\infty} 0$, we would have covered all possible trees. The next best thing that we could try is to add up all those partial trees and construct an infinite sum type. Just like we have defined sums of two types, we can define sums of many types, including infinitely many. 

An infinite sum (a coproduct):
$$ \sum_{n = 0}^{\infty} F^n 0$$
is just like a finite sum, except that it has infinitely many constructors $i_n$:
\[
 \begin{tikzcd}
 0
 \arrow[ddrr, "i_0"']
 & F 0
  \arrow[ddr, "i_1"]
& F (F 0)
  \arrow[dd, "i_2"]
 & ...
 & F^n 0
  \arrow[ddll, red, "i_n"]
 & ...
 \\
 \\
 &&\sum_{n = 0}^{\infty} F^n 0
  \end{tikzcd}
\]
It has the universal mapping-out property, just like the sum of two types, only with infinitely many cases. (Obviously, we can't express it in Haskell.)

To construct a tree of depth $n$, we would first select it from $F^n 0$ and use the $n$-th constructor $i_n$ to inject it into the sum. 

There is just one problem: the same tree shape can also be constructed using any of the $F^m 0$, for $m > n$. 

Indeed, we've seen the leaf \hask{Nothing} appear in \hask{Maybe Void} and \hask{Maybe(Maybe Void)}. In fact it shows up in any nonzero power of \hask{Maybe} acting on \hask{Void}. 

Similarly, \hask{Just Nothing} shows up in all powers starting with two.

\hask{Just(Just(Nothing))} shows up in all powers starting with three, and so on...

But there is a way to get rid of all these duplicates. The trick is to replace the sum by a colimit. Instead of a diagram consisting of discrete objects, we can construct a chain (such chains are called $\omega$-chains). Let's call this chain $\Gamma$, and its colimit $i$:
\[i = \text{Colim} \, \Gamma \]

\[
 \begin{tikzcd}
 0
 \arrow[r, "\mbox{!`}"]
 \arrow[ddrr, "i_0"']
 & F 0
  \arrow[r, "F\mbox{!`}"]
 \arrow[ddr, "i_1"]
& F (F 0)
  \arrow[r, "F(F\mbox{!`})"]
  \arrow[dd, "i_2"]
 & ...
 \arrow[r]
 & F^n 0
  \arrow[r]
 \arrow[ddll, "i_n"]
 & ...
 \\
 \\
 &&i
  \end{tikzcd}
\]
It's almost the same as the sum, but with additional arrows at the base of the cocone. These arrows are the cumulative liftings of the unique arrow $\mbox{!`}$ that goes from the initial object to $F 0$ (we called it \hask{absurd} in Haskell). The effect of these arrows is to collapse the set of infinitely many copies of the same tree down to just one representative. 

To see that, consider for instance a tree of depth $3$. It can be first found as an element of $F^3 0$, that is to say, as an arrow $t \colon 1 \to F^3 0$. It is injected into the colimit $i$ as the composite $i_3 \circ t$. 

\[
 \begin{tikzcd}
 ...
& 1
\arrow[d, red, "t"]
\arrow[dr, "t'"]
 \\
 ...
 \arrow[r]
 & F^3 0 
 \arrow[r,  blue, "F^3 (\mbox{!`})"]
 \arrow[d, blue, "i_3"]
 & F^4 0
 \arrow[dl, blue, "i_4"]
 \\
 &  i
  \end{tikzcd}
\]
The same shape of a tree is also found in $F^4 0$, as the composite $t' = F^3 (\mbox{!`}) \circ t$. It is injected into the colimit as the composite $i_4 \circ t' = i_4 \circ F^3 (\mbox{!`}) \circ t$. 

This time, however, we have the commuting triangle--- the face of the cocone:
\[i_4 \circ F^3 (\mbox{!`}) = i_3 \]
which means that:
\[ i_4 \circ t' =  i_4 \circ F^3 (\mbox{!`}) \circ t =  i_3 \circ t\]
The two copies of the tree have been identified in the colimit. You can convince yourself that this procedure removes all duplicates.

\subsection{The proof}

We can  prove directly that $i = \text{Colim}\, \Gamma$ is the initial algebra. There is however one assumption that we have to make: the functor $F$ must preserve the colimits of $\omega$-chains. The colimit of $F \Gamma$ must be equal to $F i$.
\[ \text{Colim} (F \Gamma) \cong F i \]
Fortunately, this assumption holds in $\mathbf{Set}$\footnote{This is the consequence of the fact that colimits in $\Set$ are built from disjoint unions of sets.}. 

Here's the sketch of the proof: To show the isomorphism, we'll first construct an arrow $i \to F i$ and then an arrow $\iota \colon F i \to i$ in the opposite direction. We'll skip the proof that they are the inverse of each other. Then we'll show the universality of $(i, \iota)$ by constructing a catamorphism to an arbitrary algebra.

All subsequent proofs follow a simple pattern. We start with a universal cocone that defines a colimit. Then we construct another cocone based on the same chain. From universality, there must be a unique arrow from the colimit to the apex of this new cocone.

We use this trick to construct the mapping $i \to F i$. If we can construct a cocone from the chain $\Gamma$ to $\text{Colim} (F \Gamma)$ then, by universality, there must be an arrow from $i$ to $\text{Colim} (F \Gamma)$. The latter, by our assumption that $F$ preserves colimits, is isomorphic to $F i$. So we'll have a mapping $i \to F i$.

To construct this cocone, first notice that $\text{Colim} (F \Gamma$) is, by definition, the apex of a cocone $F \Gamma$. 

\[
 \begin{tikzcd}
 F 0
 \arrow[r, blue, "F \mbox{!`}"]
 \arrow[ddrr, "j_1"']
 &  F (F 0)
  \arrow[r, blue, "F(F  \mbox{!`})"]
 \arrow[ddr, "j_2"]
& F^3 0
  \arrow[r, blue, "F^3\mbox{!`}"]
  \arrow[dd, "j_3"]
 & ...
 \arrow[r]
 & F^n 0
  \arrow[r, blue]
 \arrow[ddll, "j_n"]
 & ...
 \\
 \\
 &&\text{Colim} (F \Gamma)
  \end{tikzcd}
\]
The diagram $F \Gamma$ is the same as $\Gamma$, except that it's missing the naked initial object at the start of the chain.

The spokes of the cocone we are looking for, from $\Gamma$ to $\text{Colim} (F \Gamma)$, are marked in red in the diagram below:
\[
 \begin{tikzcd}
 0
 \arrow[r, "\mbox{!`}"]
 \arrow[d, red, "\mbox{!`}"]
 & F 0
  \arrow[r, "F\mbox{!`}"]
  \arrow[d, red, "F\mbox{!`}"]
& F (F 0)
  \arrow[r, "F(F\mbox{!`})"]
  \arrow[d, red, "F(F\mbox{!`})"]
 & ...
 \arrow[r]
 & F^{n+1} 0
  \arrow[r]
  \arrow[d, red]
 & ...
 \\
 F 0
 \arrow[r, blue, "F \mbox{!`}"]
 \arrow[ddrr, red, "j_1"']
 &  F (F 0)
  \arrow[r, blue, "F(F  \mbox{!`})"]
 \arrow[ddr, red, "j_2"]
& F^3 0
  \arrow[r, blue, "F^3\mbox{!`}"]
  \arrow[dd, red, "j_3"]
 & ...
 \arrow[r, blue]
 & F^n 0
  \arrow[r, blue]
 \arrow[ddll, red, "j_n"]
 & ...
 \\
 \\
 &&\text{Colim} (F \Gamma)
  \end{tikzcd}
\]
Since $i = \text{Colim}\, \Gamma$ is the apex of the universal cocone based on $\Gamma$, there must be a unique mapping out of it to $\text{Colim} (F \Gamma)$ which, as we said, was equal to $F i$. This is the mapping we were looking for:
\[ i \to F i \]

Next, notice that the chain $F \Gamma$ is a sub-chain of $\Gamma$, so it can be embedded in it. It means that we can construct a cocone from $F \Gamma$ to the apex $i$ by going through (a sub-chain of) $\Gamma$ (the red arrows below). 

\[
 \begin{tikzcd}
 & F 0
  \arrow[r, blue, "F\mbox{!`}"]
  \arrow[d, red]
& F (F 0)
  \arrow[r, blue, "F(F\mbox{!`})"]
  \arrow[d, red]
 & ...
 \arrow[r, blue]
 & F^{n} 0
  \arrow[r, blue]
  \arrow[d, red]
 & ...
 \\
  0
 \arrow[r, "\mbox{!`}"]
 \arrow[ddrr,  "i_0"']
 &  F 0
  \arrow[r, "F  \mbox{!`}"]
 \arrow[ddr, red, "i_1"]
& F^2 0
  \arrow[r, "F^2\mbox{!`}"]
  \arrow[dd, red, "i_2"]
 & ...
 \arrow[r]
 & F^n 0
  \arrow[r]
 \arrow[ddll, red, "i_n"]
 & ...
 \\
 \\
 &&i
  \end{tikzcd}
\]

From the universality of the $\text{Colim} (F \Gamma)$ it follows that there is a mapping out 
\[\text{Colim} (F \Gamma) \to i \]
and thus we have the mapping in the other direction:
\[ \iota \colon F i \to i \]
This shows that $i$ is a carrier of an algebra. In fact, it can be shown that the two mappings are the inverse of each other, as we would expect from the Lambek's lemma.

To show that  $(i, \iota)$ is indeed the initial algebra, we have to construct a mapping out of it to an arbitrary algebra $(a, \alpha \colon F a \to a)$. Again, we can use universality, as long as we can construct a cocone from $\Gamma$ to $a$. 

\[
 \begin{tikzcd}
 0
 \arrow[r, "\mbox{!`}"]
 \arrow[ddrr, "f_0"']
 & F 0
  \arrow[r, "F\mbox{!`}"]
 \arrow[ddr, "f_1"]
& F (F 0)
  \arrow[r, "F(F\mbox{!`})"]
  \arrow[dd, "f_2"]
 & ...
 \arrow[r]
 & F^n 0
  \arrow[r]
 \arrow[ddll, "f_n"]
 & ...
 \\
 \\
 &&a
  \end{tikzcd}
\]


The zeroth spoke of this cocone goes from $0$ to $a$, so it's just $f_0 = \mbox{!`}$.

The first spoke, $F 0 \to a$, is $f_1 = \alpha \circ F f_0$, because $F f_0 \colon F 0 \to F a$ and $\alpha \colon F a \to a$.

The third spoke, $F (F 0) \to a$ is  $f_2 = \alpha \circ F f_1$.  And so on...

The unique mapping from $i$ to $a$ is then our catamorphism. With some more diagram chasing, it can be shown that it's indeed an algebra morphism. 

Notice that this construction only works if we can ``prime'' the process by creating the leaves of the functor. If, on the other hand, $F 0 \cong 0$, then there are no leaves, and all further iterations will keep reproducing $0$. 

\end{document}